I.  Introduction

This report is one of two reports dealing with some point estimators
and their efficiencies for some common probabilistic settings. The companion
report [15] is essentially an applications report and contains much numerical
work on the asymptotic efficiencies of some estimators that are, in
essence, found by modification of the method of moments. The present report
provides conceptual background and some computational formulae.

The present report is also a working paper in which old material is
presented in a way that reflects my own interests in terms of providing a
base for generalization and expansion. Section II contains an overview of
the Cramér-Rao type lower bounds for single and multiparameter problems.
Hodges' proof [11] of Cramér's basic result [4] is included in order to
expose the structure of what happens. Avenues of extension are indicated.

Section III presents the concept of efficiency and how it extends
to multiparameter problems. Although Lemma 3.1 is known [17, p. 378], a
proof of it seems hard to locate. The proof by Dan Davis* is presented,
again to expose structure and for possible generalization.

The material in Sections IV and V is believed to be new. It
deals with the characterization of certain covariance matrices and applies
the results to a notion of "directional efficiency." This development is
motivated by the fact that although maximum likelihood estimates are
asymptotically efficient, they are often extremely hard to find; that is,
the system of likelihood equations is difficult to solve. Often the fault
lies in a few of the equations in the system (rather than all). These few
can be replaced in such a way that the resulting system is more easily solved.

* Department of Mathematics, USNPS



The resulting estimator will have a covariance matrix related to the
information matrix, and this relationship is characterized in Theorem 4.1.
The loss of efficiency is related both to the number of equations replaced
and to the quality of the replacements. The formulae expose the nature of
this division. It is shown that there is no loss of efficiency in a
subspace of the parameter space.


II.  Lower  Bounds  for  the  Variance 

Consider two sets of random variables $S_1,\ldots,S_k$ and $T_1,\ldots,T_q$,
the S's being linearly independent almost surely. That is, there exists
no set of constants $a_1,\ldots,a_k$ such that

$$P\{a_1S_1 + \cdots + a_kS_k = 0\} = 1$$

except for all the a's equal to zero. Let A be the covariance matrix
of $S_1,\ldots,S_k$; it follows that A is positive definite. There will
be no loss in assuming that each of the S's has expectation zero. Let
B be the covariance matrix of the T's and let N be the k by q
matrix of cross covariances between the S's and T's.

The following lemma is Hodges' version [11] of Cramér's result [4].

Lemma 2.1. The matrix $B - N'A^{-1}N$ is non-negative definite.

Proof: Let u be any k-vector and v be any q-vector. By the Schwarz
inequality

$$\{\mathrm{Cov}(u'S, v'T)\}^2 \le \mathrm{Var}(u'S)\,\mathrm{Var}(v'T) \qquad (2.1)$$

where the prime denotes matrix transpose. Alternatively (2.1) may be written

$$(u'Nv)^2 \le (u'Au)(v'Bv). \qquad (2.2)$$

Set $u' = v'N'A^{-1}$ in (2.2) and obtain

$$(v'N'A^{-1}Nv)^2 \le (v'N'A^{-1}Nv)(v'Bv). \qquad (2.3)$$

Since A is positive definite, so is $A^{-1}$, and the number $v'N'A^{-1}Nv$ is $\ge 0$.
If it is positive, one can divide and obtain

$$v'N'A^{-1}Nv \le v'Bv \qquad (2.4)$$

and since B is non-negative definite, the inequality is true also when
the left member is zero. Hence for all v we have

$$v'(B - N'A^{-1}N)v \ge 0 \qquad (2.5)$$

as required.
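A quick numerical illustration of Lemma 2.1 follows. It is not part of the report. The Python/NumPy sketch builds an arbitrary joint covariance matrix for (S, T), partitions it into A, N and B as in the text, and checks that $B - N'A^{-1}N$ has no negative eigenvalues. That matrix is the Schur complement of A in the joint covariance matrix, which is one way to see why the lemma holds. The helper `bound_gap` and the random construction are our own illustrative choices.

```python
import numpy as np


def bound_gap(A, N, B):
    """Return B - N' A^{-1} N; Lemma 2.1 says this is non-negative definite."""
    return B - N.T @ np.linalg.solve(A, N)


rng = np.random.default_rng(0)
k, q = 3, 2
# Any joint covariance matrix of (S, T) can be written as L L'; partition it.
L = rng.standard_normal((k + q, k + q))
Sigma = L @ L.T
A, N, B = Sigma[:k, :k], Sigma[:k, k:], Sigma[k:, k:]

gap = bound_gap(A, N, B)
print(np.linalg.eigvalsh(gap))  # all eigenvalues >= 0, up to rounding
```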

The lemma has rather broad value in providing lower bounds. To
illustrate the common usage, we introduce the setting for a case of regular
point estimation. Let the parameter space $\Theta$ be an open subset of
p-dimensional Euclidean space and let the population sampled have a (generalized)
density function $f(x;\theta)$, $\theta \in \Theta$. Given a sample $x_1,\ldots,x_n$ of size n,
the likelihood function will be denoted by

$$L(x;\theta) = f(x_1;\theta)\,f(x_2;\theta)\cdots f(x_n;\theta). \qquad (2.6)$$

Moreover, the quantities

$$s_r(x) = \partial \ln f(x;\theta)/\partial\theta_r, \qquad r = 1,\ldots,p \qquad (2.7)$$

$$s_{rj}(x) = \partial^2 \ln f(x;\theta)/\partial\theta_r\,\partial\theta_j, \qquad r,j = 1,\ldots,p \qquad (2.8)$$

are assumed to exist.


The setting for a case of regular point estimation requires
assumptions concerning the ability to differentiate under the integral
sign. Following Wilks [17] we assume (using $F(x;\theta) = \int_{-\infty}^{x} f(u;\theta)\,du$)

$$E\{s_r(X)\} = \frac{\partial}{\partial\theta_r}\int dF(x;\theta) = 0, \qquad r = 1,\ldots,p \qquad (2.9)$$

$$E\{s_r(X)\,s_j(X)\} + E\{s_{rj}(X)\} = \frac{\partial^2}{\partial\theta_r\,\partial\theta_j}\int dF(x;\theta) = 0, \qquad r,j = 1,\ldots,p \qquad (2.10)$$

From (2.9) and (2.10) it follows that the covariance matrix of the $\{s_r(X)\}$
can be calculated from

$$\mathrm{Cov}\{s_r(X), s_j(X)\} = -E\{s_{rj}(X)\}. \qquad (2.11)$$

The symbol $\Lambda$ will be reserved for this matrix. It is the (Fisher)
information matrix.

A common choice for $S_1,\ldots,S_k$ of Lemma 2.1 is

$$S_r = \sum_{i=1}^{n} s_r(X_i), \qquad r = 1,\ldots,k \le p. \qquad (2.12)$$

Then the S's are also functions of $\theta$. The linear independence assumption,
and hence the positive definite nature of A, must hold for almost all $\theta$,
and (when k = p)

$$A = n\Lambda. \qquad (2.13)$$
The random variables $T_1,\ldots,T_q$ are statistics; suppose they are used
to estimate $\theta_1,\ldots,\theta_q$. If we choose not to require that the T's be
unbiased estimators, then we should characterize the risk matrix

$$R = E\{(T-\theta)(T-\theta)'\} = B + b \qquad (2.14)$$

where, in (2.14), $\theta' = (\theta_1,\ldots,\theta_q)$ and b is the matrix of products of biases

$$b = (\mu-\theta)(\mu-\theta)' \qquad (2.15)$$

using $\mu = E(T)$. Since b is obviously non-negative definite, it follows
that Lemma 2.1 could have stated that $R - N'A^{-1}N$ is non-negative definite,
but this is not as sharp.
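The decomposition (2.14) can be checked on simulated data. The sketch below is our own construction: the shrinkage factor 0.8 and the noise matrix are arbitrary. Sample moments replace expectations, and because the sample covariance uses divisor n, the identity $R = B + b$ holds exactly for them, not merely approximately.

```python
import numpy as np

rng = np.random.default_rng(7)
theta = np.array([1.0, -2.0])
n_rep = 50_000
# Hypothetical biased estimator: a shrunken, noisy version of theta.
noise = rng.standard_normal((n_rep, 2)) @ np.array([[1.0, 0.3], [0.0, 0.5]])
T = 0.8 * theta + noise

R = (T - theta).T @ (T - theta) / n_rep   # empirical E{(T - theta)(T - theta)'}
mu = T.mean(axis=0)                       # empirical E(T)
B = np.cov(T, rowvar=False, ddof=0)       # empirical covariance matrix of T
b = np.outer(mu - theta, mu - theta)      # products of biases, as in (2.15)
print(np.allclose(R, B + b))              # True: (2.14) holds exactly for sample moments
print(np.linalg.eigvalsh(b))              # b is non-negative definite (rank one)
```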


First Application. Let p = q = k = 1 and assume that T is a regular
estimate for $\theta$ in the sense that E(T) can be differentiated under the
integral sign. Then

$$N = E(TS) = \int T(x)\,\frac{\partial \ln L(x;\theta)}{\partial\theta}\,L(x;\theta)\,dx = \frac{\partial}{\partial\theta}\int T(x)\,L(x;\theta)\,dx = \frac{d}{d\theta}E(T) = 1 + b'(\theta). \qquad (2.16)$$

Then, applying Lemma 2.1 and using (2.13) and (2.16) provides the familiar
lower bound for the variance of T

$$\sigma_T^2 \ge \frac{\{1 + b'(\theta)\}^2}{n\Lambda}. \qquad (2.17)$$

There are common examples in which $\sigma_T^2$ equals the lower bound. Then T
can be said to be the best regular estimator.
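The next example shows a case in which (2.17) is attained. It assumes a Poisson sample with mean $\lambda$ and the estimator $T = \bar X$; this is our own choice, not an example from the report. The sketch computes the per-observation information $\Lambda$ by a truncated series. It then compares the bound $1/(n\Lambda)$ with the exact variance $\lambda/n$ of $\bar X$. The helper name is hypothetical.

```python
import numpy as np


def poisson_fisher_information(lam, x_max=200):
    """Per-observation information E{s(X)^2}, score s(x) = x/lam - 1, by a truncated sum."""
    x = np.arange(x_max)
    log_factorial = np.cumsum(np.log(np.maximum(x, 1)))   # log x!, with 0! = 1
    pmf = np.exp(-lam + x * np.log(lam) - log_factorial)
    score = x / lam - 1.0
    return np.sum(pmf * score**2)


lam, n = 3.0, 25
info = poisson_fisher_information(lam)   # equals 1/lam up to rounding
cr_bound = 1.0 / (n * info)              # (2.17) with b'(theta) = 0
var_xbar = lam / n                       # exact variance of the sample mean
print(info, cr_bound, var_xbar)          # the bound is attained by the sample mean
```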
Second Application. Suppose q = 1, $k \le p$, and T is intended to estimate
$\theta_1$. Also T is regular in the sense that E(T) can be differentiated under
the integral sign with respect to each $\theta_r$. Thus, the rth element
of N is

$$N_r = E\{TS_r\} = \int T(x)\,\frac{\partial \ln L(x;\theta)}{\partial\theta_r}\,L(x;\theta)\,dx = \frac{\partial}{\partial\theta_r}\int T(x)\,L(x;\theta)\,dx = \delta_{1r} + b_r(\theta), \qquad r = 1,\ldots,k \qquad (2.18)$$

where $\delta_{1r}$ is the Kronecker delta and $b_r$ is the partial derivative of the
bias function with respect to $\theta_r$. The resulting inequality is

$$\sigma_T^2 \ge N'A^{-1}N \qquad (2.19)$$

where A is the upper left k by k corner of $n\Lambda$. This provides a lower
bound for the variance (risk) of an estimator for $\theta_1$ when there are
nuisance parameters present. Moreover, the bound is nondecreasing as k
is increased. This matters because it affects the sharpness of the
bound. Let us justify this point by drawing attention to relationships
with the multiple correlation coefficient.

Consider the projection (in $L^2$) of T on the subspace spanned by
$1, S_1,\ldots,S_k$. The mathematical problem is to choose the scalar c and
the k-vector $\beta$ so as to minimize $E\{T - c - \beta'S\}^2$. The solution is
$c = E(T) = \theta_1 + b(\theta)$ and $\beta' = N'A^{-1}$. The projection is

$$\hat T = E(T) + N'A^{-1}S \qquad (2.20)$$

and easy calculations show the mean square error

$$\mathrm{MSE} = E\{T - \hat T\}^2 = \mathrm{Var}(T) - N'A^{-1}N \qquad (2.21)$$

which is surely nonincreasing as k increases. Working next on an inner
product term produces

$$\begin{aligned} E\{(T-\hat T)(\hat T-\theta_1)\} &= E\{(T-\hat T)(b + \beta'S)\} \\ &= \beta'E(ST) - \beta'E(S\hat T) = \beta'N - \beta'E\{S(\mu + S'\beta)\} \\ &= N'A^{-1}N - N'A^{-1}AA^{-1}N = 0 \end{aligned} \qquad (2.22)$$

where the term in $b = b(\theta)$ vanishes because $E(\hat T) = E(T)$.
It follows that the orthogonal decomposition

$$E(T-\theta_1)^2 = E\{T-\hat T\}^2 + E(\hat T-\theta_1)^2 \qquad (2.23)$$

is valid. Similar calculations yield

$$E\{\hat T-\theta_1\}^2 = \mathrm{Var}(\hat T) + b^2(\theta) = N'A^{-1}N + b^2(\theta). \qquad (2.24)$$

Using (2.21) and (2.24) allows (2.23) to be rewritten

$$R_T = \mathrm{MSE} + b^2(\theta) + N'A^{-1}N. \qquad (2.25)$$

Since $R_T$ and $b(\theta)$ are fixed, the nonincreasing feature of MSE implies
that the bound $N'A^{-1}N$ is nondecreasing.

The quantity $\rho$ defined by

$$1 - \rho^2 = \frac{\mathrm{MSE} + b^2(\theta)}{\mathrm{Var}(T) + b^2(\theta)} \qquad (2.25)$$

plays a role analogous to the multiple correlation coefficient. Clearly

$$0 \le \rho \le 1.$$
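The projection argument can be mimicked with sample least squares. There the residual is exactly orthogonal to every linear combination of $1, S_1, \ldots, S_k$. The sketch below uses hypothetical variables, and sample averages stand in for expectations. It shows that the in-sample analogue of the MSE in (2.21) never increases with k, and that the decomposition (2.23) holds up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rep, theta1 = 20_000, 2.0
# Hypothetical setup: k = 3 zero-mean "score-like" variables S, and a statistic T
# that is correlated with the first two of them and has bias 0.3.
S = rng.standard_normal((n_rep, 3))
T = theta1 + 0.3 + 0.8 * S[:, 0] + 0.4 * S[:, 1] + rng.standard_normal(n_rep)


def project(T, S_k):
    """Least-squares projection of T on span{1, S_1, ..., S_k}; returns T_hat."""
    X = np.column_stack([np.ones(len(T)), S_k])
    coef, *_ = np.linalg.lstsq(X, T, rcond=None)
    return X @ coef


# Empirical analogue of (2.21): the MSE cannot increase as k grows.
mse_by_k = [np.mean((T - project(T, S[:, :k])) ** 2) for k in range(4)]

# Empirical analogue of the orthogonal decomposition (2.23).
T_hat = project(T, S)
lhs = np.mean((T - theta1) ** 2)
rhs = np.mean((T - T_hat) ** 2) + np.mean((T_hat - theta1) ** 2)
print(mse_by_k)
print(lhs, rhs)   # equal up to rounding
```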
Third Application. Suppose p = q = 1 and, following Bhattacharyya [2],
we define

$$S_r = \frac{1}{L(x;\theta)}\,\frac{\partial^r L(x;\theta)}{\partial\theta^r}, \qquad r = 1,\ldots,k. \qquad (2.26)$$

Now the regularity conditions must be extended to include k differentiations
under the integral sign so that

$$E(S_r) = \frac{\partial^r}{\partial\theta^r}\int L(x;\theta)\,dx = 0. \qquad (2.27)$$

Also E(T) must be differentiated under the integral k times, so we can
use

$$N_r = E(TS_r) = \frac{\partial^r}{\partial\theta^r}\int T(x)\,L(x;\theta)\,dx = \frac{d^r}{d\theta^r}\{\theta + b(\theta)\} = \delta_{1r} + b^{(r)}(\theta). \qquad (2.28)$$

Further

$$A_{rj} = E\left\{\frac{1}{L^2(x;\theta)}\,\frac{\partial^r L(x;\theta)}{\partial\theta^r}\,\frac{\partial^j L(x;\theta)}{\partial\theta^j}\right\}. \qquad (2.29)$$

This technique can produce an increasing sequence of bounds for some problems,
and a sequence of constants for others [2,11].

The preceding description is more general than one usually finds.
Typically the T's are the estimators (hence statistics). The S's are
functions of the model. The bound is trivial if the T's are not correlated
with the S's, i.e. N = 0. Thus there is a hierarchy of problems which,
in vague terms, might be expressed as follows: Given S, find the "best"
T among all T's correlated with S. Having characterized this problem for
each choice of S, find the "best" S.
Some progress has been made in this approach to finding good
estimators. It usually appears under the name of minimum variance unbiased
regular estimation. The bounds are sharp in some of the more popular
settings. An example is included in a setting for which sharp bounds are
not known.


Example (Geometric). Consider a sample of size one from a geometric density

$$f(x;p) = p\,q^{x}, \qquad q = 1 - p, \qquad x = 0,1,2,\ldots \qquad (2.30)$$

This is a member of the Darmois-Koopman family. It is known [11, p. 2-21]
that

$$T = \begin{cases} 1 & \text{if } X = 0 \\ 0 & \text{if } X > 0 \end{cases} \qquad (2.31)$$

is the only unbiased estimator for p. Its variance pq certainly is
uniformly minimum among all unbiased estimators. Applying (2.17) and (A.9)
from the appendix produces the classical Cramér-Rao lower bound for regular
unbiased estimators

$$L_1 = p^2 q. \qquad (2.32)$$

Clearly this is not sharp ($p^2q < pq$). The margin is made graphic in Figure 2.1.

The maximum likelihood estimator, $\hat p = (1+X)^{-1}$, is biased. Its
variance and bias function are developed in the appendix; see (A.4) to (A.6).
Its risk is

$$R = \sigma^2(\hat p) + b^2(p) \qquad (2.33)$$

and, using (2.17) and (2.32), a lower bound for the risk is

$$R_B = q\,p^2\{1 + b'(p)\}^2 + b^2(p) \qquad (2.34)$$

where b'(p) is given in (A.7). Both R and $R_B$ are included in Figure 2.1.


[Figure 2.1: Comparison of Risks and Bounds, Geometric Distribution]
Legend: L = Cramér-Rao lower bound; V = variance of the unbiased estimator; R = risk of the maximum likelihood estimator; R_B = lower bound for the risk.


Since $R > R_B$, a sharp bound still has not been found. There remains the
possibility that another estimator, with a different bias function, may
have a variance that matches its lower bound (2.17). It is curious that
$\hat p$ is not uniformly better than T.
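The comparison behind Figure 2.1 can be approximated numerically. The following sketch is ours. It evaluates $E(\hat p)$ and $E(\hat p^2)$ by truncating the series over x. It replaces the appendix formula (A.7) for b'(p) with a central finite difference. It then prints L = $p^2q$, V = pq, the risk R of (2.33) and the bound $R_B$ of (2.34) at a few values of p. The truncation point `X_MAX` and the step `h` are arbitrary choices.

```python
import numpy as np

X_MAX = 5000  # truncation point for the series over x = 0, 1, 2, ...


def geometric_moments(p):
    """E(p_hat) and E(p_hat^2) for p_hat = 1/(1+X), with P(X = x) = p*q**x."""
    x = np.arange(X_MAX)
    pmf = p * (1.0 - p) ** x
    p_hat = 1.0 / (1.0 + x)
    return np.sum(pmf * p_hat), np.sum(pmf * p_hat**2)


def bias(p):
    """Bias function b(p) = E(p_hat) - p."""
    return geometric_moments(p)[0] - p


def risk_and_bound(p, h=1e-6):
    """Risk R of (2.33) and lower bound R_B of (2.34), with b'(p) by central difference."""
    q = 1.0 - p
    m1, m2 = geometric_moments(p)
    b = m1 - p
    risk = m2 - 2.0 * p * m1 + p**2                     # E(p_hat - p)^2
    b_prime = (bias(p + h) - bias(p - h)) / (2.0 * h)
    bound = q * p**2 * (1.0 + b_prime) ** 2 + b**2
    return risk, bound


for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    q = 1.0 - p
    R, R_B = risk_and_bound(p)
    print(f"p={p:.2f}  L={p * p * q:.4f}  V={p * q:.4f}  R={R:.4f}  R_B={R_B:.4f}")
```

At p = 0.05 the risk R exceeds V = pq, while at p = 0.95 it falls below V. This is one concrete view of the remark that $\hat p$ is not uniformly better than T.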

Returning to unbiased estimators, the classical Cramér-Rao bound can
be improved by the method of Bhattacharyya. Here A is the covariance
matrix of the $\{S_r\}_{r=1}^{k}$ of (2.26) and N is a vector of zeros except for
unity in the first position, by (2.28). By (2.19) the lower bound is

$$L_k = N'A^{-1}N \qquad (2.35)$$

and these are nondecreasing in k. In [11, p. 2-23] it is shown that

$$L_1 = p^2 q \ \text{(Cramér-Rao)}, \qquad L_2 = p^2 q(1+q). \qquad (2.36)$$

Attention has been brought to the notion that if $L_k = p^2q(1 + q + \cdots + q^{k-1})$,
then the sequence $\{L_k\}$ would converge to $p^2q/(1-q) = pq$ and the bound would be sharp.
It is shown in the appendix that $L_k$ does possess the form speculated.
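The Bhattacharyya bounds for the geometric example can be computed directly from (2.26), (2.28), (2.29) and (2.35). The sketch below is our own. It writes $S_r$ in closed form using Leibniz's rule and falling factorials, and truncates the sums at `x_max`. It also rescales the $S_r$, which leaves $N'A^{-1}N$ unchanged, to improve numerical conditioning. Its output can be compared with the form $p^2q(1 + q + \cdots + q^{k-1})$ quoted from the appendix.

```python
import numpy as np


def falling(x, m):
    """Falling factorial x(x-1)...(x-m+1), with falling(x, 0) = 1."""
    out = np.ones_like(x, dtype=float)
    for i in range(m):
        out *= x - i
    return out


def bhattacharyya_bound(p, k, x_max=400):
    """L_k = N'A^{-1}N of (2.35) for one geometric observation, f(x;p) = p*q**x."""
    q = 1.0 - p
    x = np.arange(x_max, dtype=float)
    pmf = p * q**x
    # S_r = f^(r)/f; by Leibniz's rule (p has only one nonzero derivative),
    # S_r = (-1)^r x^(r) / q^r + r (-1)^(r-1) x^(r-1) / (p q^(r-1)).
    S = np.array([
        (-1) ** r * falling(x, r) / q**r
        + r * (-1) ** (r - 1) * falling(x, r - 1) / (p * q ** (r - 1))
        for r in range(1, k + 1)
    ])
    A = (S * pmf) @ S.T                 # A_rj = E(S_r S_j), eq. (2.29)
    N = np.zeros(k)
    N[0] = 1.0                          # unbiased T: N = (1, 0, ..., 0)', eq. (2.28)
    d = 1.0 / np.sqrt(np.diag(A))       # rescaling S_r leaves N'A^{-1}N unchanged
    A_s, N_s = A * np.outer(d, d), N * d
    return N_s @ np.linalg.solve(A_s, N_s)


p = 0.5
for k in range(1, 5):
    print(k, bhattacharyya_bound(p, k), p * p * (1 - p) * sum((1 - p) ** j for j in range(k)))
```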


III.  Concept  of  Efficiency 

When  two  estimates  of  the  same  quantity  have  unequal  risks,  the 
ratio  of  the  smaller  risk  to  the  larger  one  is  called  the  efficiency  of  the 
latter  estimator  with  respect  to  the  former  one.  The  usefulness  of  this 
measure  presumes  that  the  distributions  of  the  two  statistics  have  roughly 
similar  shapes  and  the  risks  are  (approximately)  inversely  proportional 
to  the  sample  size.  Thus  the  efficiency  can  be  interpreted  as  the  ratio 
of  sample  sizes  needed  so  that  the  two  statistics  could  estimate  the 
parameter  equally  well. 

Often the Cramér-Rao lower bound is used in place of an estimator
to  serve  as  the  standard  of  comparison.  This  is  satisfactory  provided 
the  bound  is  sharp. 

A useful extension of the idea of efficiency to the multiple
parameter case appears in [5] and utilizes the concept of an ellipse of
concentration. To explain, let p = 2, consider an estimator $(T_1,T_2)$
of $(\theta_1,\theta_2)$, and let B be its covariance matrix as before. The ellipse
of concentration is that ellipse in the plane, centered at $(\theta_1,\theta_2)$, which
serves as the positive sample space of uniformly distributed random
variables that have the same covariance matrix B. The efficacy of
this lies in the standardization of geometrical shape. The comparison of
two estimators $(T_1,T_2)$ and $(T_1^*,T_2^*)$ is accomplished by comparing their
ellipses of concentration.

Covariance matrices are often (roughly) inversely proportional to
sample size. Then the concept of efficiency as the ratio of sample sizes
required to perform the same job is preserved if the determinants of the
covariance matrices are compared, that is, the squared areas of the ellipses
of concentration. This extends obviously to the general multiparameter
case, the ratio of determinants being the ratio of squared contents of
hyper-ellipsoids of concentration.
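For p = 2, a uniform distribution on the ellipse $(x-c)'B^{-1}(x-c) \le 4$ has covariance matrix B. In p dimensions the constant 4 becomes p + 2. The area of that ellipse is $4\pi|B|^{1/2}$, so squared areas compare as determinants. The following Monte Carlo sketch checks both facts. It uses NumPy, and the matrix B and the center are illustrative values of ours.

```python
import numpy as np


def sample_concentration_ellipse(B, center, size, rng):
    """Uniform draws from the ellipse (x - c)' B^{-1} (x - c) <= 4 (the p = 2 case)."""
    L = np.linalg.cholesky(B)
    r = np.sqrt(rng.uniform(size=size))           # radius for uniform draws in the unit disk
    phi = rng.uniform(0.0, 2.0 * np.pi, size=size)
    u = np.column_stack([r * np.cos(phi), r * np.sin(phi)])  # covariance I/4
    return center + 2.0 * u @ L.T                 # scale by sqrt(p + 2) = 2, shape by L


def ellipse_area(B):
    """Area of {x : x' B^{-1} x <= 4}, which is 4*pi*sqrt(det B)."""
    return 4.0 * np.pi * np.sqrt(np.linalg.det(B))


rng = np.random.default_rng(2)
B = np.array([[2.0, 0.6], [0.6, 1.0]])    # hypothetical covariance matrix of (T1, T2)
pts = sample_concentration_ellipse(B, np.array([1.0, -1.0]), 400_000, rng)
print(np.cov(pts, rowvar=False))          # close to B
```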

The following lemma is useful in defining multiparameter efficiency
with respect to a standard.

Lemma 3.1. Let q = p and B (as well as A) be positive definite.
If $(u'Au)(v'Bv) \ge (u'v)^2$ for all u, v, then $|AB| \ge 1$ (where the vertical bars
denote determinant).
Proof: (Dan Davis). Let P be an orthogonal transformation that diagonalizes
A ($P'AP = D$, $P'P = I$). The hypothesis becomes (upon replacing u with Pu)

$$(u'Du)(v'Bv) \ge (u'P'v)^2$$

and in this expression, let us also replace v with Pv, yielding

$$(u'Du)(v'P'BPv) \ge (u'v)^2. \qquad (3.1)$$

Since A is positive definite, the diagonal elements of D are positive
and $D^{-1/2}$ exists. Let $u = D^{-1/2}w$ and $v = D^{1/2}z$. Thus (3.1) becomes

$$(w'w)(z'D^{1/2}P'BPD^{1/2}z) \ge (w'z)^2. \qquad (3.2)$$

Let $C = D^{1/2}P'BPD^{1/2}$ and note that

$$|C| = |D^{1/2}|\,|D^{1/2}|\,|B| = |A|\,|B| = |AB|,$$

and the proof will be completed when we have shown that all the eigenvalues
of C are $\ge 1$. The inequality (3.2) is preserved if C is rotated to
diagonal form, since replacing w and z by Qw and Qz, for an orthogonal Q,
leaves $w'w$ and $w'z$ unchanged. The diagonal elements are then the eigenvalues.
Let $w = z = e_i$, the unit vector in the ith component direction. It
follows from (3.2) that the ith eigenvalue of C is $\ge 1$. This is true
for each i = 1,...,p. Q.E.D.
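The hypothesis of Lemma 3.1 holds exactly when $B - A^{-1}$ is non-negative definite, because $\max_u (u'v)^2/(u'Au) = v'A^{-1}v$. The sketch below uses that fact to build a test pair (A, B). It spot-checks the hypothesis on random directions. It then confirms that $|AB| \ge 1$ and that the eigenvalues of the proof's matrix C are at least 1. The construction is ours, not the report's.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
R = rng.standard_normal((p, p))
A = R @ R.T + p * np.eye(p)              # positive definite A
W = rng.standard_normal((p, 2))
B = np.linalg.inv(A) + W @ W.T           # B - A^{-1} non-negative definite

# Spot-check the hypothesis (u'Au)(v'Bv) >= (u'v)^2 on random directions.
U = rng.standard_normal((1000, p))
V = rng.standard_normal((1000, p))
lhs = np.einsum("ij,jk,ik->i", U, A, U) * np.einsum("ij,jk,ik->i", V, B, V)
rhs = np.einsum("ij,ij->i", U, V) ** 2

# The proof's C = D^{1/2} P'BP D^{1/2}, where A = P D P'.
d, P = np.linalg.eigh(A)
C = np.diag(np.sqrt(d)) @ P.T @ B @ P @ np.diag(np.sqrt(d))
print(np.all(lhs >= rhs), np.linalg.det(A @ B), np.linalg.eigvalsh(C).min())
```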

The hypothesis of Lemma 3.1 is not always met in a setting of
regular estimation. Let us examine this question for regular estimators.
Consider the vector equation

$$E(T) = \theta + b(\theta) \qquad (3.4)$$

where b is the vector of bias functions. Taking the partial derivative
of the ith component of (3.4) with respect to $\theta_r$ produces

$$\frac{\partial}{\partial\theta_r}\int T_i(x)\,L(x;\theta)\,dx = \int T_i(x)\,S_r\,L(x;\theta)\,dx = \delta_{ir} + \frac{\partial b_i}{\partial\theta_r} \qquad (3.5)$$

or

$$N = \mathrm{Cov}(S,T), \qquad (3.6)$$

whose (r, i) element is $\delta_{ir} + \partial b_i/\partial\theta_r$. Since (2.2) is valid in general, the hypothesis of Lemma 3.1 is met when
T is unbiased (then N = I), and also in some cases when the bias functions
decrease in appropriate ways.

For cases of unbiased regular estimation, we can set $A = n\Lambda$ and
define the efficiency of the multiparameter estimator T to be

$$\mathrm{Eff}(T) = \frac{|(n\Lambda)^{-1}|}{|B|} = \frac{1}{|n\Lambda B|}. \qquad (3.7)$$


IV.  Asymptotic  Covariance  Matrices 

We begin with two lemmas that are wholly mathematical in nature.
Suppose the two quadratic forms A and B are both positive definite
and have certain submatrices in common. Use row and column permutations,
if necessary, and assume the partitioned form

$$A = \begin{pmatrix} E & F \\ F' & G \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} E & F \\ F' & H \end{pmatrix} \qquad (4.1)$$

where E is q by q, G and H are p-q by p-q, and F is q by p-q. This structure is
most useful when (4.1) is the most extreme such representation, that is,
no row of (F', G) is equal to any row of (F', H).
Lemma 4.1. The rank of $A^{-1} - B^{-1}$ is $\le p-q$.

Proof: The expressions for inverting partitioned matrices [7, p. 165] allow
the representations

$$A^{-1} = \begin{pmatrix} X & -E^{-1}FW \\ -G^{-1}F'X & W \end{pmatrix}, \qquad B^{-1} = \begin{pmatrix} Y & -E^{-1}FZ \\ -H^{-1}F'Y & Z \end{pmatrix} \qquad (4.1')$$

where

$$\begin{aligned} X &= [E - FG^{-1}F']^{-1} = E^{-1} + E^{-1}FWF'E^{-1} \\ Y &= [E - FH^{-1}F']^{-1} = E^{-1} + E^{-1}FZF'E^{-1} \\ W &= [G - F'E^{-1}F]^{-1} = G^{-1} + G^{-1}F'XFG^{-1} \\ Z &= [H - F'E^{-1}F]^{-1} = H^{-1} + H^{-1}F'YFH^{-1} \end{aligned} \qquad (4.2)$$

and E, G, H, W, X, Y, and Z are symmetric.

Using the form

$$A^{-1} - B^{-1} = \begin{pmatrix} X - Y & -E^{-1}F(W-Z) \\ -G^{-1}F'X + H^{-1}F'Y & W - Z \end{pmatrix} \qquad (4.3)$$

note that the first q rows of (4.3) will be represented as a linear
combination of the last p-q rows (namely, $-E^{-1}F$ times them), as soon as
it is exhibited that

$$E^{-1}F[G^{-1}F'X - H^{-1}F'Y] = X - Y. \qquad (4.4)$$

Using the symmetry of $A^{-1}$ and $B^{-1}$ in (4.1'), which gives $G^{-1}F'X = WF'E^{-1}$
and $H^{-1}F'Y = ZF'E^{-1}$, the left member of (4.4) can be written as

$$E^{-1}F(W-Z)F'E^{-1}$$

and the fact that this is X - Y can be seen by subtracting the right members
of the first two expressions in (4.2). Thus the rank of $A^{-1} - B^{-1}$ is
no more than p-q.
Lemma 4.2. The rank of $A^{-1} - B^{-1}$ is p-q if G - H is invertible.

Proof: Since W - Z is the lower right (p-q) by (p-q) block of (4.3), it is
sufficient to show that $(W-Z)^{-1}$ exists. Using the identity

$$W - Z = Z(Z^{-1} - W^{-1})W$$

it is seen that $(W-Z)^{-1} = W^{-1}(Z^{-1} - W^{-1})^{-1}Z^{-1}$, or, using (4.2),

$$(W-Z)^{-1} = [G - F'E^{-1}F]\,[H - G]^{-1}\,[H - F'E^{-1}F]$$

which exists if H - G is invertible.
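Lemmas 4.1 and 4.2 are easy to check numerically. In this sketch, which uses our own random construction, A and B share E and F. H differs from G by a positive definite matrix, so G - H is invertible. The rank of $A^{-1} - B^{-1}$ should then be exactly p - q. Its first q rows should equal $-E^{-1}F$ times its last p - q rows, as in (4.4). The tolerance passed to `matrix_rank` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = 5, 2
R = rng.standard_normal((p, p))
A = R @ R.T + np.eye(p)                    # A = [[E, F], [F', G]], positive definite
K = rng.standard_normal((p - q, p - q))
B = A.copy()
B[q:, q:] += K @ K.T + np.eye(p - q)       # same E and F; H - G positive definite

D = np.linalg.inv(A) - np.linalg.inv(B)
print(np.linalg.matrix_rank(D, tol=1e-8))  # p - q = 3

# (4.4): the first q rows of D are -E^{-1}F times its last p - q rows.
E, F = A[:q, :q], A[:q, q:]
print(np.allclose(D[:q, :], -np.linalg.solve(E, F) @ D[q:, :]))
```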

The structure treated above occurs when $A = \Lambda$, the information
matrix, and $B = M^{-1}$, where $\frac{1}{n}M + o(\frac{1}{n})$ is the covariance matrix of estimators
resulting from a system that has some but not all of the likelihood
equations. We are dealing with the asymptotic forms.

It is assumed that the estimation equations have the form

$$g(x,\theta) = 0 \qquad (4.8)$$

where $g = (g_1,\ldots,g_p)$ and each component is a symmetric function of x,
i.e. is invariant under permutations of $x_1,\ldots,x_n$. In order for the system
(4.8) to have a unique solution $\tilde\theta$, we require (by the Implicit Function
Theorem [16]) that the Jacobian $|\partial g_i/\partial\theta_j|$ be nonzero.
Moreover, we require

$$E\{g(X,\theta)\} = 0,$$

a property acquired by manipulation.

For convenience of analysis, it is further assumed that the estimating
equations (4.8) have been scaled so that

$$\mathrm{Var}\{g_j(X,\theta)\} = \frac{c_j}{n} + o\!\left(\frac{1}{n}\right) \qquad (4.10)$$

for some positive constants $c_1,\ldots,c_p$. Moreover, for our purposes it is
assumed that the $\{g_j(x,\theta)\}$ have bounded continuous partial derivatives
with respect to $\theta_1,\ldots,\theta_p$ and that $\tilde\theta$, resulting from the solution of
(4.8), is consistent. Hence the estimate is asymptotically unbiased.

Reference [10] contains a deep treatment of the general question of
the existence of consistent, asymptotically normal estimates. There, the
functions $\{g_j\}$ are averages of the form $\frac{1}{n}\sum_{i=1}^{n} g_j(x_i,\theta)$, and this structure
precludes much of what has been assumed so far. Indeed, all of the examples
treated thus far can be cast in this average form. The goals of the present
work are much less pure than those of LeCam, and the question of verifying
the consistency of $\tilde\theta$ is left to the applier.

It  is  noted  that  the  equations  for  maximum  likelihood  estimation 
can  be  cast  into  this  structure. 

Finally, let $A(x,\theta)$ be the p by p matrix of partial derivatives
$\{\partial g_i/\partial\theta_k\}$ and assume that

$$A_{ik}(x,\theta) \to E\left(\frac{\partial g_i}{\partial\theta_k}\right) \qquad (4.11)$$

as $n \to \infty$; the resulting limit matrix will be denoted by $A = A(\theta)$.
The assumptions allow the first order expansion

$$g(x,\tilde\theta) = g(x,\theta) + A(x,\theta + \rho(\tilde\theta - \theta))(\tilde\theta - \theta) \qquad (4.12)$$

where $\rho$ is a diagonal matrix of random numbers belonging to the interval
[0,1]. Since the system is soluble, $g(x,\tilde\theta) = 0$ and we can write

$$(\tilde\theta - \theta) = -A^{-1}(x,\theta + \rho(\tilde\theta - \theta))\,g(x,\theta). \qquad (4.13)$$

The continuity of A implies that of g and of $A^{-1}$. Letting the asymptotic
covariance matrices be defined by

$$M = \lim_{n\to\infty} nE(\tilde\theta - \theta)(\tilde\theta - \theta)' \qquad (4.14)$$

$$C = \lim_{n\to\infty} nE\{g(X,\theta)\,g'(X,\theta)\} \qquad (4.15)$$

it follows that

$$M = A^{-1}C(A^{-1})'. \qquad (4.16)$$

When g = 0 is the set of likelihood equations, i.e. from (2.12)

$$\frac{1}{n}S_r = 0 \quad\text{for } r = 1,\ldots,p \qquad (4.17)$$

it is well-known (and easily verified) that M of (4.14) and (4.16) is $\Lambda^{-1}$
(the inverse of the information matrix), C of (4.15) is $\Lambda$, and A of
(4.11) is, by (2.11),

$$E\{s_{jk}(X)\} = -\Lambda \qquad (4.18)$$

which is symmetric in this case.

Now let us suppose, without loss of generality, that only the first
q equations of (4.17) are used in the system g = 0. Let us denote this
subset by the symbols $\varphi = 0$, and let the remaining p-q equations be h = 0.
Thus, in partitioned form, (4.8) becomes

$$g = \begin{pmatrix} \varphi \\ h \end{pmatrix} = 0. \qquad (4.19)$$

All assumptions are met and we can proceed formally:

$$C = \lim nE\{gg'\} = \lim n\begin{pmatrix} E(\varphi\varphi') & E(\varphi h') \\ E(h\varphi') & E(hh') \end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}. \qquad (4.20)$$
The information matrix can be partitioned similarly

$$\Lambda = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix} \qquad (4.21)$$

where $\Lambda_{11}$ is a q by q matrix. The following result is obvious:

Lemma 4.3. $C_{11} = \Lambda_{11}$.

Let us define a p-q by q matrix $\gamma_{21}$ whose elements are $E\{\partial h_j/\partial\theta_k\}$
for j = 1,...,p-q and k = 1,...,q.

Lemma 4.4. $C_{21} = -\gamma_{21}$.

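Proof: From (4.20) it is seen that the (j,k)th element of $C_{21}$ is the
limit of

$$nE\left\{h_j(X,\theta)\,\frac{1}{n}\sum_{i=1}^{n}\frac{\partial \ln f(X_i,\theta)}{\partial\theta_k}\right\} = nE\left\{h_j(X,\theta)\,\frac{\partial \ln f(X_1,\theta)}{\partial\theta_k}\right\}$$

by the interchangeability property of the function $h_j$ with respect to
$X = (X_1,\ldots,X_n)$. Now the assumptions imply that the equation
$0 = E[h_j(X,\theta)] = \int h_j(x,\theta)\exp\{\sum_{i=1}^{n}\ln f(x_i,\theta)\}\,dx$ can be differentiated
with respect to each $\theta_k$ under the integral sign. So doing produces

$$0 = E\left\{\frac{\partial h_j(X,\theta)}{\partial\theta_k}\right\} + E\left\{h_j(X,\theta)\sum_{i=1}^{n}\frac{\partial \ln f(X_i,\theta)}{\partial\theta_k}\right\}$$

which, since the second term equals the left member displayed above, is the
desired result.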
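Lemmas 4.3 and 4.4 provide the representation

$$C = \begin{pmatrix} \Lambda_{11} & -\gamma_{12} \\ -\gamma_{21} & C_{22} \end{pmatrix} \qquad (4.22)$$

for the asymptotic covariance matrix of $g' = (\varphi', h')$. It follows rather
quickly from (4.11) and the discussion accompanying (4.18) that

$$A = \begin{pmatrix} -\Lambda_{11} & -\Lambda_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \qquad (4.23)$$

where $\gamma_{22}$ is defined to complement $\gamma_{21}$ and has elements $E\{\partial h_j/\partial\theta_k\}$
for j = 1,...,p-q and k = q+1,...,p. Using (4.22) and [7] we can
characterize $C^{-1}$ as

$$C^{-1} = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix} \qquad (4.24)$$

where, using $\gamma_{12} = \gamma_{21}'$,

$$G_{11} = [\Lambda_{11} - \gamma_{12}C_{22}^{-1}\gamma_{21}]^{-1} = \Lambda_{11}^{-1} + \Lambda_{11}^{-1}\gamma_{12}G_{22}\gamma_{21}\Lambda_{11}^{-1} \qquad (4.25)$$

$$G_{12} = G_{21}' = \Lambda_{11}^{-1}\gamma_{12}G_{22} = G_{11}\gamma_{12}C_{22}^{-1} \qquad (4.26)$$

$$G_{21} = G_{12}' = C_{22}^{-1}\gamma_{21}G_{11} = G_{22}\gamma_{21}\Lambda_{11}^{-1} \qquad (4.27)$$

$$G_{22} = [C_{22} - \gamma_{21}\Lambda_{11}^{-1}\gamma_{12}]^{-1} = C_{22}^{-1}[I + \gamma_{21}G_{11}\gamma_{12}C_{22}^{-1}] \qquad (4.28)$$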


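Theorem 4.1. If the first q equations of g = 0 are likelihood equations,
then

$$M^{-1} = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & H \end{pmatrix}$$

where

$$H = \Lambda_{21}G_{11}\Lambda_{12} - \Lambda_{21}G_{12}\gamma_{22} - (\Lambda_{21}G_{12}\gamma_{22})' + \gamma_{22}'G_{22}\gamma_{22}.$$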

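Proof: From (4.16) we have

$$M^{-1} = A'C^{-1}A. \qquad (4.29)$$

Multiplying the partitioned matrices (4.23) and (4.24), followed by applying
the relationships (4.25) and (4.26), yields the product form

$$C^{-1}A = \begin{pmatrix} -I & G_{12}\gamma_{22} - G_{11}\Lambda_{12} \\ 0 & G_{22}\gamma_{22} - G_{21}\Lambda_{12} \end{pmatrix} \qquad (4.30)$$

and draws attention to the relationships

$$G_{11}\Lambda_{11} - G_{12}\gamma_{21} = I \qquad (4.31)$$

$$G_{21}\Lambda_{11} - G_{22}\gamma_{21} = 0. \qquad (4.32)$$

Multiplying (4.30) on the left by A' from (4.23) produces the intermediate
form

$$M^{-1} = \begin{pmatrix} \Lambda_{11} & \Lambda_{11}(G_{11}\Lambda_{12} - G_{12}\gamma_{22}) + \gamma_{12}(G_{22}\gamma_{22} - G_{21}\Lambda_{12}) \\ \Lambda_{21} & \Lambda_{21}(G_{11}\Lambda_{12} - G_{12}\gamma_{22}) + \gamma_{22}'(G_{22}\gamma_{22} - G_{21}\Lambda_{12}) \end{pmatrix}.$$

Applying (4.31) and (4.32) in the forms $\Lambda_{11}G_{11} = I + \gamma_{12}G_{21}$ and
$\Lambda_{11}G_{12} = \gamma_{12}G_{22}$ to the terms in the upper right corner yields the
reduction to $\Lambda_{12}$. The terms in the lower right need only be rearranged and
the form of the transpose recognized. Q.E.D.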

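Theorem 4.1 can be used to structure the computation of $M^{-1}$. This
computation is particularly easy when $\Lambda_{12} = 0 = \Lambda_{21}$. Then the main effort
goes into producing $G_{22}$ from (4.28). Otherwise alternative forms for the
lower right submatrix of $M^{-1}$ may be useful. Once $G_{22}$ is made available,
the products $\Lambda_{21}G_{12}\gamma_{22} = \Lambda_{21}\Lambda_{11}^{-1}\gamma_{12}G_{22}\gamma_{22}$ follow using (4.26). Then $G_{11}$ is
obtained from the right member of (4.25). Assembling the results produces

$$\begin{aligned} H = {}& \gamma_{22}'G_{22}\gamma_{22} - \Lambda_{21}\Lambda_{11}^{-1}\gamma_{12}G_{22}\gamma_{22} - \gamma_{22}'G_{22}\gamma_{21}\Lambda_{11}^{-1}\Lambda_{12} \\ &+ \Lambda_{21}\Lambda_{11}^{-1}\gamma_{12}G_{22}\gamma_{21}\Lambda_{11}^{-1}\Lambda_{12} + \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12} \end{aligned} \qquad (4.33)$$

for this pesky submatrix.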


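Corollary 4.1. If p = 2 and q = 1 then

$$M^{-1} = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{12} & \dfrac{\Lambda_{11}\gamma_{22}^2 - 2\gamma_{12}\gamma_{22}\Lambda_{12} + C_{22}\Lambda_{12}^2}{\Lambda_{11}C_{22} - \gamma_{12}^2} \end{pmatrix}.$$

Proof. Under this hypothesis all the submatrices of $M^{-1}$, C, and A are
scalars. Using (4.22) and (4.28) one obtains $G_{22} = \Lambda_{11}/|C|$, where
$|C| = \det C = \Lambda_{11}C_{22} - \gamma_{12}^2$, which when used in (4.33) will verify the lower right corner
after reduction.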
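The algebra of Theorem 4.1 and formula (4.33) can be verified without any statistical model. The sketch below is ours. It draws an arbitrary positive definite stand-in for $\Lambda$ and arbitrary $\gamma_{21}$, $\gamma_{22}$. It chooses $C_{22}$ so that C of (4.22) is positive definite, then forms A of (4.23). Finally it compares $A'C^{-1}A$ with the claimed block structure. All numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = 4, 2
R = rng.standard_normal((p, p))
Lam = R @ R.T + p * np.eye(p)                  # stand-in information matrix
L11, L12, L21 = Lam[:q, :q], Lam[:q, q:], Lam[q:, :q]
g21 = rng.standard_normal((p - q, q))          # gamma_21: E{dh_j/dtheta_k}, k <= q
g22 = rng.standard_normal((p - q, p - q))      # gamma_22: E{dh_j/dtheta_k}, k > q
g12 = g21.T
K = rng.standard_normal((p - q, p - q))
C22 = g21 @ np.linalg.solve(L11, g12) + K @ K.T + np.eye(p - q)  # keeps C positive definite

C = np.block([[L11, -g12], [-g21, C22]])       # representation (4.22)
A = np.block([[-L11, -L12], [g21, g22]])       # representation (4.23)
M_inv = A.T @ np.linalg.solve(C, A)            # (4.29): M^{-1} = A' C^{-1} A

# Lower-right block assembled as in (4.33).
L11_inv = np.linalg.inv(L11)
G22 = np.linalg.inv(C22 - g21 @ L11_inv @ g12)                   # (4.28)
H = (g22.T @ G22 @ g22
     - L21 @ L11_inv @ g12 @ G22 @ g22
     - g22.T @ G22 @ g21 @ L11_inv @ L12
     + L21 @ L11_inv @ g12 @ G22 @ g21 @ L11_inv @ L12
     + L21 @ L11_inv @ L12)

print(np.allclose(M_inv[:q, :], Lam[:q, :]))   # first q rows coincide with Lambda
print(np.allclose(M_inv[q:, q:], H))           # lower-right block matches (4.33)
```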
The matrices $\Lambda$ and $M^{-1}$ have the substructure of (4.1). By
Lemma 4.1 the rank of $\Lambda^{-1} - M$ is no more than p-q and will be equal
to p-q if, according to Lemma 4.2, the following matrix is invertible:

$$\Lambda_{22} - \Lambda_{21}G_{11}\Lambda_{12} + \Lambda_{21}G_{12}\gamma_{22} + (\Lambda_{21}G_{12}\gamma_{22})' - \gamma_{22}'G_{22}\gamma_{22}. \qquad (4.34)$$

Use of (4.33) can be made to express (4.34) as

$$\Lambda_{22} - \Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12} - [\gamma_{22} - \gamma_{21}\Lambda_{11}^{-1}\Lambda_{12}]'\,G_{22}\,[\gamma_{22} - \gamma_{21}\Lambda_{11}^{-1}\Lambda_{12}].$$

One might expect that the characterization presented by Theorem 4.1
can be generalized to two different estimation systems, $g^{(1)} = 0$ and $g^{(2)} = 0$,
that have some equations, $\varphi = 0$, in common. That is, neither system is
assumed to be the maximum likelihood system. In such a generalization, one
would expect the two inverse covariance matrices, $M_1^{-1}$ and $M_2^{-1}$, to have
submatrices in common. In general this is false. A counterexample,
involving the gamma distribution, is presented in [15, Sec. 4]. The
fundamental reason is that Lemma 4.4 is not available.


V.  Directional  Efficiency 


Combining (3.7) and (4.14) leads to the multiparameter definition
of asymptotic efficiency

$$\mathrm{Eff}(\tilde\theta) = \frac{|\Lambda^{-1}|}{|M|}. \qquad (5.1)$$


Since maximum likelihood estimates are efficient, consistent, and
asymptotically normal, that is

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{\mathcal{L}} N(0, \Lambda^{-1}), \qquad (5.2)$$

the expression (5.1) represents the comparative rate at which the squared volumes
of the ellipsoids of concentration of $\tilde\theta$ converge to zero. That is, the
matrix of that ellipsoid is roughly $M/n$, and $|\Lambda^{-1}|$ represents the best
rate at which it can shrink to zero as $n \to \infty$.

The estimate $\tilde\theta$ has been presented as a surrogate for $\hat\theta$, the
maximum likelihood estimate, which may be too hard to find. The efficiency
of $\tilde\theta$ will depend on two choices: the number q of likelihood equations
retained and the quality of the replacement equations. The following concept
of directional efficiency may be useful in examining these choices. The
quantities $v'\tilde\theta$ and $v'\hat\theta$ are competing estimators of the linear combination
$v'\theta$. The vector v specifies a direction in the parameter space. The quantity

$$e = \lim_{n\to\infty}\frac{\mathrm{Var}(v'\hat\theta)}{\mathrm{Var}(v'\tilde\theta)} = \frac{v'\Lambda^{-1}v}{v'Mv} \qquad (5.3)$$

is the (one-dimensional) asymptotic efficiency in the direction v.

The invariance feature of maximum likelihood estimates ensures that
$0 \le e \le 1$ for all v. Let us relate these directional efficiencies to
the multivariate efficiency.
The right portion of (5.3) may be rewritten as

$$v'(eM - \Lambda^{-1})v = 0 \qquad (5.4)$$

and serves to define e implicitly as a function of v. Since $\tilde\theta$ is
asymptotically unbiased, we know from (2.5) and (2.28) that N = I and

$$v'(M - \Lambda^{-1})v \ge 0 \qquad (5.5)$$

for all directions v. Thus the directional efficiency e tells us how
much M must be scaled down in order to produce zero in the direction v.

To characterize the critical values of e(v), let us set the
gradient of e equal to zero. This results in the system

$$(e(v)M - \Lambda^{-1})v = 0. \qquad (5.6)$$

Since $v \ne 0$ and e(v) is a scalar, it is seen that there are p solutions
(with possible multiplicities) to the equation $|eM - \Lambda^{-1}| = 0$. These
critical solutions obviously satisfy

$$1 \ge e_1 \ge e_2 \ge \cdots \ge e_p > 0 \qquad (5.7)$$

and, by the theory of simultaneous reduction of two quadratic forms [12],
to each $e_i$ is associated a critical direction $v_i$ by solving (5.6) when
e(v) is set equal to $e_i$. It follows that

$$e_p \le e(v) \le e_1 \qquad (5.8)$$

as v varies over the p-dimensional sphere, and there exist directions of
greater and lesser efficiency. Moreover, the critical efficiencies are
the eigenvalues of $M^{-1}\Lambda^{-1}$, because (5.6) is equivalent to $(eI - M^{-1}\Lambda^{-1})v = 0$.
Upon multiplying them, it follows that
Upon  multiplying  them,  it  follows  that 
$$\prod_{i=1}^{p} e_i = \bigl|M^{-1}A^{-1}\bigr| = \frac{|A^{-1}|}{|M|} = \operatorname{Eff}(\tilde\theta) \tag{5.9}$$

from (5.1). Thus the multiparameter efficiency is the product of the critical directional efficiencies, and is no larger than any of them.
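For $p = 2$, the critical efficiencies can be computed directly as the two eigenvalues of $M^{-1}A^{-1}$. This gives a quick numerical check of (5.8) and (5.9). The Python sketch below uses hypothetical matrices; any $M$ with $M - A^{-1} \ge 0$ would do. The eigenvalues come from the trace and determinant, which is adequate only in the $2 \times 2$ case.

```python
import math


def det2(S):
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]


def inv2(S):
    d = det2(S)
    return [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]]


def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def critical_efficiencies(A_inv, M):
    """Return (e_1, e_2), the eigenvalues of M^{-1} A^{-1}, largest first."""
    B = matmul2(inv2(M), A_inv)
    tr, dt = B[0][0] + B[1][1], det2(B)
    # B is similar to a symmetric matrix, so the discriminant is >= 0 up to rounding.
    disc = math.sqrt(max(tr * tr - 4.0 * dt, 0.0))
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def directional_efficiency(v, A_inv, M):
    num = sum(v[i] * A_inv[i][j] * v[j] for i in range(2) for j in range(2))
    den = sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))
    return num / den


A_inv = [[1.0, 0.5], [0.5, 1.0]]   # hypothetical
M = [[2.0, 0.5], [0.5, 1.5]]       # hypothetical; M - A_inv = diag(1, 0.5)

e1, e2 = critical_efficiencies(A_inv, M)
print(e1, e2, e1 * e2, det2(A_inv) / det2(M))   # product check of (5.9)

# (5.8): sampled directions on the unit circle stay between e_2 and e_1.
effs = [directional_efficiency([math.cos(t / 100), math.sin(t / 100)], A_inv, M)
        for t in range(629)]
print(min(effs) >= e2 - 1e-12, max(effs) <= e1 + 1e-12)
```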

Let us apply this material to that of Section IV. Suppose the first $q$ equations of $g = 0$ are the first $q$ likelihood equations. Clearly $|eM - A^{-1}| = 0$ is equivalent to

$$|eA - M^{-1}| = 0 \tag{5.10}$$

Applying Theorem 4.1 then shows that the root $e = 1$ appears at least $q$ times. Thus the directional efficiencies of $\tilde\theta$ are unity in a $q$-dimensional subspace of the parameter space.
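The equivalence of $|eM - A^{-1}| = 0$ and (5.10) follows from one line of matrix algebra. It uses only the fact that $M$ and $A$ are nonsingular:

```latex
M^{-1}\bigl(eM - A^{-1}\bigr)A = eA - M^{-1},
\qquad\text{so}\qquad
|eA - M^{-1}| = \frac{|A|}{|M|}\,\bigl|eM - A^{-1}\bigr| .
```

Hence the two determinants vanish for the same values of $e$.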

The above provides some quantification for the notion that $q$ should be as close to $p$ as possible. Turning to the question of measuring the quality of the replacement equations, the values of $e_{q+1}, \ldots, e_p$ and their associated directions may prove useful.

We close with an example that illustrates the above features. Consider a gamma density

$$f(x;\alpha,\beta) = \frac{x^{\alpha-1}e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \qquad 0 < x < \infty,\quad 0 < \alpha,\quad 0 < \beta \tag{5.11}$$


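The properties (5.12) through (5.16) are developed in [15, Sec. 4]. The maximum likelihood equations are

$$\bar{x} - \alpha\beta = 0, \qquad \overline{\ln x} - \ln\beta - \psi(\alpha) = 0 \tag{5.12}$$

Here $\psi(\alpha)$ is the psi function and $\overline{\ln x}$ is the sample mean of the $\ln x_i$. The information matrix is

$$A = \frac{1}{\beta^{2}}\begin{pmatrix}\alpha & \beta\\ \beta & \beta^{2}\psi'(\alpha)\end{pmatrix} \tag{5.13}$$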
and the identification is $\theta_1 = \beta$, $\theta_2 = \alpha$. Since (5.12) cannot be solved explicitly, one often retreats to the method of moments. Its equations are (using $s^2$ for the sample variance)

$$\bar{x} - \alpha\beta = 0, \qquad s^{2} - \alpha\beta^{2} = 0 \tag{5.14}$$

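which can be solved explicitly. Note that (5.14) shares an equation with (5.12), so Theorem 4.1 applies. In fact,

$$M^{-1} = \frac{1}{\beta^{2}}\begin{pmatrix}\alpha & \beta\\ \beta & \dfrac{\beta^{2}(2\alpha+3)}{2\alpha(\alpha+1)}\end{pmatrix} \tag{5.15}$$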

Also, the efficiency of the method of moments is

$$\operatorname{Eff}(\tilde\theta) = \frac{1}{2(\alpha+1)\bigl(\alpha\psi'(\alpha) - 1\bigr)} \tag{5.16}$$
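Evaluating (5.16) requires the trigamma function $\psi'$, which the Python standard library does not provide. The sketch below uses the recurrence $\psi'(x) = \psi'(x+1) + 1/x^2$ to push the argument above 6, then applies the standard asymptotic series. This is an illustrative choice of method, and the helper names are hypothetical. The exact values $\psi'(1) = \pi^2/6$ and $\psi'(3) = \pi^2/6 - 1 - 1/4$ serve as checks.

```python
import math


def trigamma(x):
    """psi'(x) for x > 0 via recurrence plus asymptotic series (absolute error below 1e-9)."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + inv2 / 2.0 + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return acc + series


def mom_efficiency(alpha):
    """Eff = 1 / (2(alpha+1)(alpha psi'(alpha) - 1)), equation (5.16); free of beta."""
    return 1.0 / (2.0 * (alpha + 1.0) * (alpha * trigamma(alpha) - 1.0))


for a in (0.5, 1.0, 3.0, 10.0, 100.0):
    print(a, round(mom_efficiency(a), 4))
```

The efficiency depends on the shape parameter alone. It falls toward zero for small $\alpha$ and rises toward one as $\alpha$ grows.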


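Since $p = 2$ and $q = 1$ in this case, the root $e = 1$ appears once, and the other root can be obtained by using (5.16) in (5.9). Thus

$$e_1 = 1, \qquad e_2 = \frac{1}{2(\alpha+1)\bigl(\alpha\psi'(\alpha) - 1\bigr)}$$

and one can proceed to calculate the corresponding directions. The result is

$$v_1 = \begin{pmatrix}\alpha\\ \beta\end{pmatrix}, \qquad v_2 = \begin{pmatrix}0\\ 1\end{pmatrix}$$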
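The critical directions can be confirmed numerically. The vectors $v_1 = (\alpha, \beta)'$ and $v_2 = (0, 1)'$ should be eigenvectors of $M^{-1}A^{-1}$, with eigenvalues $1$ and $e_2$. The sketch below takes $\alpha = 3$ and $\beta = 2.5$, the values quoted for Figure 5.1, because $\psi'(3) = \pi^2/6 - 1 - 1/4$ is then exact. The helper names are hypothetical.

```python
import math

alpha, beta = 3.0, 2.5
tri = math.pi ** 2 / 6 - 1.0 - 0.25          # psi'(3), exact for integer argument

A = [[alpha / beta ** 2, 1 / beta], [1 / beta, tri]]                 # (5.13)
M_inv = [[alpha / beta ** 2, 1 / beta],
         [1 / beta, (2 * alpha + 3) / (2 * alpha * (alpha + 1))]]    # (5.15)


def inv2(S):
    d = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]]


def matvec(S, v):
    return [S[0][0] * v[0] + S[0][1] * v[1], S[1][0] * v[0] + S[1][1] * v[1]]


def apply_B(v):
    """Return M^{-1} A^{-1} v."""
    return matvec(M_inv, matvec(inv2(A), v))


e2 = 1.0 / (2.0 * (alpha + 1.0) * (alpha * tri - 1.0))
v1, v2 = [alpha, beta], [0.0, 1.0]
print(apply_B(v1), v1)                     # eigenvalue 1
print(apply_B(v2), [e2 * c for c in v2])   # eigenvalue e_2
print(round(e2, 4))
```

At these parameter values the shape parameter is estimated with efficiency of about 0.676, while $2\alpha\beta$ is estimated with full efficiency.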


Full efficiency is available in the first direction, i.e. when estimating

$$v_1'\theta = 2\alpha\beta$$

or any scalar multiple thereof. Thus the product $\alpha\beta$ is efficiently estimated, and this is not surprising: it is the quantity shared by the two systems (5.12) and (5.14).

The minimum efficiency is available in the second direction,

$$v_2'\theta = \alpha$$

Thus the estimate of the shape parameter suffers most when (5.14) is used.

It seems useful to consider Figure 5.1. It contains the ellipses of concentration related to the maximum likelihood estimate (5.12) and the moment estimate (5.14). The inner ellipse corresponds to maximum likelihood. It lies entirely within the outer one, since $A - M^{-1} \ge 0$. By Theorem 4.1, the two ellipses touch at the two points where they meet the $\beta$ axis. Project all of the probability mass onto a line of given direction. The resulting marginal distributions have maximum discrepancy when the projection is onto the vertical ($\alpha$) axis. The two projections coincide when done in the direction $(\alpha, \beta)$, here $(3, 2.5)$.
APPENDIX

Some Properties of the Geometric Distribution

The geometric distribution has density

$$f(x;p) = pq^{x}, \qquad x = 0, 1, \ldots \tag{A.1}$$

for a random variable $X$ representing the number of failures preceding the first success in a series of Bernoulli trials, with $q = 1 - p$. It is well known that

$$\mu = q/p, \qquad \sigma^{2} = q/p^{2}$$


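The maximum likelihood estimator for $p$ based on a single observation $X$ is

$$\hat p = \frac{1}{1+X}$$

and its first two moments may be characterized as follows:

$$E(\hat p) = \sum_{x=0}^{\infty}\frac{pq^{x}}{1+x} = \frac{p}{q}\sum_{x=0}^{\infty}\frac{q^{x+1}}{x+1} = \frac{p}{q}\int_0^q\sum_{x=0}^{\infty}u^{x}\,du = \frac{p}{q}\int_0^q\frac{du}{1-u} = -\frac{p\ln p}{q}$$

$$\begin{aligned}
E(\hat p^{2}) &= \sum_{x=0}^{\infty}\frac{pq^{x}}{(1+x)^{2}} = \frac{p}{q}\sum_{x=0}^{\infty}\int_0^1 u^{x}\,du\int_0^q v^{x}\,dv = \frac{p}{q}\int_0^q\!\!\int_0^1\frac{du\,dv}{1-uv} = -\frac{p}{q}\int_0^q\frac{\ln(1-v)}{v}\,dv\\
&= \frac{p}{q}\int_0^q\sum_{j=1}^{\infty}\frac{v^{j-1}}{j}\,dv = \frac{p}{q}\sum_{j=1}^{\infty}\frac{q^{j}}{j^{2}}
\end{aligned}$$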
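The two closed forms can be checked against direct summation of the defining series. The Python sketch below truncates all sums. This is harmless for the moderate values of $p$ used, since the neglected tails are of order $q^{N}$ for $N$ terms. The function names are hypothetical.

```python
import math


def mean_phat(p):
    """E(1/(1+X)) = -p ln p / q."""
    return -p * math.log(p) / (1.0 - p)


def second_moment_phat(p, terms=2000):
    """E(1/(1+X)^2) = (p/q) * sum_{j>=1} q^j / j^2, truncated after `terms` terms."""
    q = 1.0 - p
    return p / q * sum(q ** j / j ** 2 for j in range(1, terms + 1))


def direct(p, power, terms=2000):
    """Truncated sum of p q^x / (1+x)^power over x = 0, 1, ..."""
    q = 1.0 - p
    return sum(p * q ** x / (1 + x) ** power for x in range(terms))


for p in (0.2, 0.5, 0.9):
    print(p, mean_phat(p), direct(p, 1), second_moment_phat(p), direct(p, 2))
```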


The bias function is

$$b(p) = E(\hat p) - p = -\frac{p\,(\ln p + q)}{q} \tag{A.6}$$

which is uniformly positive and has derivative

$$b'(p) = -\frac{\ln p}{q^{2}} - \frac{1+q}{q}$$

This derivative decreases monotonically and changes sign at about $p = .3162$. The maximum bias is $.216$. The log likelihood function has derivative

$$S_1 = \frac{1}{p} - \frac{X}{q}$$

and variance

$$E(S_1^{2}) = \frac{1}{p^{2}q}$$
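The figures quoted for the bias can be reproduced in a few lines of Python: the sign change of $b'$ near $p = .3162$ and the maximum bias of $.216$. The sketch below locates the zero of $b'(p)$ by bisection, relying on the monotonicity stated above. The helper names are hypothetical.

```python
import math


def bias(p):
    """b(p) = E(phat) - p = -p (ln p + q) / q, equation (A.6)."""
    q = 1.0 - p
    return -p * (math.log(p) + q) / q


def bias_slope(p):
    """b'(p) = -ln p / q^2 - (1 + q) / q."""
    q = 1.0 - p
    return -math.log(p) / q ** 2 - (1.0 + q) / q


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p_star = bisect(bias_slope, 0.01, 0.99)
print(round(p_star, 4), round(bias(p_star), 3))   # prints 0.3162 0.216
```

At the maximizing point, $-\ln p = q(1+q)$. Substituting into (A.6) shows that the maximum bias equals $pq$ there.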


To develop the sequence of Bhattacharyya bounds, one must first characterize (2.26),

$$S_r = \frac{1}{f(x;p)}\,\frac{\partial^{r} f(x;p)}{\partial p^{r}} \tag{A.10}$$


Using the notation

$$x^{(r)} = x(x-1)\cdots(x-r+1) \tag{A.11}$$

for falling factorials, one can verify

$$\frac{\partial^{r} f}{\partial p^{r}} = (-1)^{r-1}\,\frac{r\,x^{(r-1)}}{p}\,f(x-r+1) + (-1)^{r}\,x^{(r)}\,f(x-r) \tag{A.12}$$

by induction. Dividing (A.12) by (A.1) yields

$$S_r = \frac{(-1)^{r}}{q^{r}}\Bigl\{X^{(r)} - \frac{rq}{p}\,X^{(r-1)}\Bigr\} \tag{A.13}$$


and no end corrections are necessary. The covariance matrix calculations require joint factorial moments.
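As a sanity check on (A.13), the scores can be evaluated directly and their moments summed over the geometric law. We should find $E(S_r) = 0$ for every $r$, and $E(S_1^2)$ should reproduce $1/(p^2q)$. Because $x^{(r)} = 0$ for integers $0 \le x < r$, small $x$ needs no special handling, in line with the remark on end corrections. The truncation length and helper names in this Python sketch are hypothetical choices.

```python
def falling(x, r):
    """Falling factorial x^{(r)} = x(x-1)...(x-r+1), as in (A.11); equals 1 when r = 0."""
    out = 1
    for k in range(r):
        out *= x - k
    return out


def score(x, r, p):
    """S_r from (A.13)."""
    q = 1.0 - p
    return (-1) ** r / q ** r * (falling(x, r) - r * q / p * falling(x, r - 1))


def expect(g, p, terms=400):
    """Truncated expectation of g(X) under f(x; p) = p q^x."""
    q = 1.0 - p
    return sum(p * q ** x * g(x) for x in range(terms))


p = 0.4
for r in (1, 2, 3, 4):
    print(r, expect(lambda x, r=r: score(x, r, p), p))      # each close to 0
print(expect(lambda x: score(x, 1, p) ** 2, p), 1 / (p ** 2 * (1 - p)))
```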

Consider the generating function

$$G(u) = E(u^{X}) = \frac{p}{1-qu} \tag{A.14}$$

Differentiating $r$ times gives

$$E(X^{(r)}) = G^{(r)}(1) = \frac{r!\,pq^{r}}{(1-qu)^{r+1}}\bigg|_{u=1} = r!\left(\frac{q}{p}\right)^{r} \tag{A.15}$$


In similar fashion the product moments $E(X^{(s)}X^{(r)})$ can be obtained from

$$E(X^{(s)}X^{(r)}) = \frac{\partial^{r+s}G(uv)}{\partial v^{s}\,\partial u^{r}}\bigg|_{u=v=1} \tag{A.16}$$

using the product argument $uv$ in (A.14). One can proceed as follows. Starting with

$$\frac{\partial^{r}G(uv)}{\partial u^{r}} = \frac{r!\,pq^{r}v^{r}}{(1-quv)^{r+1}} \tag{A.17}$$


one can continue with the Leibniz formula for derivatives of a product,

$$(fg)^{(s)} = \sum_{j=0}^{s}\binom{s}{j}f^{(j)}g^{(s-j)} \tag{A.18}$$

where in (A.18) the superscripts denote derivatives. For $j \le r$ the $j$th derivative of $v^{r}$ is

$$\frac{r!}{(r-j)!}\,v^{r-j} \tag{A.19}$$

and the $(s-j)$th derivative of $(1-quv)^{-(r+1)}$ is

$$\frac{(r+s-j)!}{r!}\,(qu)^{s-j}\,(1-quv)^{-(r+1+s-j)} \tag{A.20}$$


To collect: to obtain (A.16) one needs $s$ derivatives of (A.17) with respect to $v$. These are provided by inserting (A.19) and (A.20) into (A.18) and setting $u = v = 1$. Thus, for $s \le r$,

$$\begin{aligned}
E(X^{(s)}X^{(r)}) &= r!\,pq^{r}\sum_{j=0}^{s}\binom{s}{j}\frac{r!}{(r-j)!}\,\frac{(r+s-j)!}{r!}\,\frac{q^{s-j}}{(1-q)^{r+1+s-j}}\\
&= r!\,s!\left(\frac{q}{p}\right)^{r+s}\sum_{j=0}^{s}\left(\frac{p}{q}\right)^{j}\binom{s}{j}\binom{r+s-j}{r-j}
\end{aligned} \tag{A.21}$$

which appears to be the most convenient form.
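Formula (A.21) is easy to mechanize with exact rational arithmetic. The Python sketch below implements it for rational $p$ and compares it with a truncated direct summation; with $s = 0$ it reduces to (A.15). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def joint_factorial_moment(s, r, p):
    """E(X^{(s)} X^{(r)}) for 0 <= s <= r from (A.21), exact for rational p."""
    p = Fraction(p)
    q = 1 - p
    total = sum((p / q) ** j * comb(s, j) * comb(r + s - j, r - j) for j in range(s + 1))
    return factorial(r) * factorial(s) * (q / p) ** (r + s) * total


def falling(x, r):
    out = 1
    for k in range(r):
        out *= x - k
    return out


def direct(s, r, p, terms=400):
    """Truncated float sum of x^{(s)} x^{(r)} p q^x over x = 0, 1, ..."""
    q = 1.0 - p
    return sum(falling(x, s) * falling(x, r) * p * q ** x for x in range(terms))


print(joint_factorial_moment(2, 3, Fraction(1, 2)), direct(2, 3, 0.5))
print(joint_factorial_moment(0, 4, Fraction(1, 3)), factorial(4) * Fraction(2) ** 4)   # (A.15)
```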

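Returning to the question of product moments for (A.13), express

$$S_sS_r = \frac{(-1)^{r+s}}{q^{r+s}}\Bigl\{X^{(r)}X^{(s)} - \frac{q}{p}\bigl[rX^{(r-1)}X^{(s)} + sX^{(r)}X^{(s-1)}\bigr] + rs\left(\frac{q}{p}\right)^{2}X^{(r-1)}X^{(s-1)}\Bigr\} \tag{A.22}$$

Applying the expectation operator to this using (A.21) produces, for $s \le r$,

$$E(S_sS_r) = \frac{(-1)^{r+s}\,r!\,s!}{p^{r+s}}\Bigl\{\sum_{j=0}^{s}\left(\frac{p}{q}\right)^{j}\binom{s}{j}\Bigl[\binom{r+s-j}{r-j} - \binom{r-1+s-j}{r-1-j}\Bigr] - \sum_{j=0}^{s-1}\left(\frac{p}{q}\right)^{j}\binom{s-1}{j}\Bigl[\binom{r+s-1-j}{r-j} - \binom{r+s-2-j}{r-1-j}\Bigr]\Bigr\} \tag{A.23}$$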

This can be reduced by applying the formula $\binom{n-1}{k} + \binom{n-1}{k-1} = \binom{n}{k}$ appropriately. Thus

$$\binom{r+s-j}{r-j} - \binom{r-1+s-j}{r-1-j} = \binom{r-1+s-j}{r-j}, \qquad \binom{r+s-1-j}{r-j} - \binom{r+s-2-j}{r-1-j} = \binom{r+s-2-j}{r-j}$$


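can be inserted into that part of (A.23) which is enclosed in braces, producing the intermediate form

$$\sum_{j=0}^{s-1}\left(\frac{p}{q}\right)^{j}\Bigl[\binom{s}{j}\binom{r+s-1-j}{r-j} - \binom{s-1}{j}\binom{r+s-2-j}{r-j}\Bigr] + \left(\frac{p}{q}\right)^{s}\binom{r-1}{r-s} \tag{A.24}$$

The differences of products of binomial coefficients can be combined in a straightforward way, so that (A.24) becomes

$$\sum_{j=0}^{s-1}\left(\frac{p}{q}\right)^{j}\binom{s}{j}\binom{r+s-1-j}{s-1}\frac{rs-j}{s(r+s-1-j)} + \left(\frac{p}{q}\right)^{s}\binom{r-1}{r-s}$$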


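Then one assembles the final form, for $s \le r$:

$$E(S_sS_r) = \frac{(-1)^{r+s}\,r!\,s!}{p^{r+s}}\Bigl\{\sum_{j=0}^{s-1}\left(\frac{p}{q}\right)^{j}\binom{s}{j}\binom{r+s-1-j}{s-1}\frac{rs-j}{s(r+s-1-j)} + \left(\frac{p}{q}\right)^{s}\binom{r-1}{r-s}\Bigr\} \tag{A.25}$$

The product moments (A.25) are covariances, since each $E(S_r) = 0$.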
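The final form (A.25) can also be coded with exact rationals. It can then be checked against the closed forms for $\lambda_{1r}$ and $\lambda_{2r}$ stated next, and against the $(3,3)$ entry of the $k = 3$ matrix. The Python sketch below is hypothetical in its naming. It assumes $1 \le s \le r$ and swaps the arguments otherwise, since $E(S_sS_r)$ is symmetric.

```python
from fractions import Fraction
from math import comb, factorial


def score_cov(s, r, p):
    """lambda_{sr} = E(S_s S_r) from (A.25), exact for rational p."""
    if s > r:
        s, r = r, s              # E(S_s S_r) is symmetric in (s, r)
    p = Fraction(p)
    q = 1 - p
    body = sum((p / q) ** j * comb(s, j) * comb(r + s - 1 - j, s - 1)
               * Fraction(r * s - j, s * (r + s - 1 - j))
               for j in range(s))
    body += (p / q) ** s * comb(r - 1, r - s)
    return (-1) ** (r + s) * factorial(r) * factorial(s) / p ** (r + s) * body


p = Fraction(1, 3)
q = 1 - p
print(score_cov(1, 1, p), 1 / (p ** 2 * q))          # both 27/2
print(score_cov(2, 5, p), (-1) ** (5 + 2) * 2 * factorial(5) * (4 + q) / (q ** 2 * p ** 7))
```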


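Because of the structure of (2.35), the bound is the leading element of $\Lambda^{-1}$. Let us use the notation $\lambda_{rs} = E(S_rS_s)$ and apply (A.25) to show

$$\lambda_{1r} = \frac{(-1)^{r+1}\,r!}{qp^{r+1}}, \qquad r \ge 1 \tag{A.26}$$

$$\lambda_{2r} = \frac{(-1)^{r+2}\,2\,r!}{q^{2}p^{r+2}}\,(r-1+q), \qquad r \ge 2 \tag{A.27}$$

and that, for $k = 3$,

$$\Lambda = \begin{pmatrix} \dfrac{1}{p^{2}q} & \dfrac{-2}{p^{3}q} & \dfrac{6}{p^{4}q}\\[2ex] \dfrac{-2}{p^{3}q} & \dfrac{4(1+q)}{p^{4}q^{2}} & \dfrac{-12(2+q)}{p^{5}q^{2}}\\[2ex] \dfrac{6}{p^{4}q} & \dfrac{-12(2+q)}{p^{5}q^{2}} & \dfrac{36(1+4q+q^{2})}{p^{6}q^{3}} \end{pmatrix}$$

Direct calculations show that the cofactor of $\lambda_{11}$ is

$$\frac{144\,(1+q+q^{2})}{p^{10}q^{5}}$$

and that

$$|\Lambda| = \frac{144}{p^{12}q^{6}}$$

implying the desired form for the leading element of $\Lambda^{-1}$.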
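Taking (2.35) as stated, the $k = 3$ bound is the cofactor of $\lambda_{11}$ divided by $|\Lambda|$. A short exact-arithmetic check in Python, with the matrix entries copied from the display above, confirms both determinantal facts. It gives $(\Lambda^{-1})_{11} = p^2q(1+q+q^2)$, which can be compared with the Cramér-Rao value $1/\lambda_{11} = p^2q$. The function names are hypothetical.

```python
from fractions import Fraction


def bhattacharyya_matrix_k3(p):
    """Lambda = (lambda_rs), r, s = 1, 2, 3, for the geometric law."""
    p = Fraction(p)
    q = 1 - p
    l11 = 1 / (p ** 2 * q)
    l12 = -2 / (p ** 3 * q)
    l13 = 6 / (p ** 4 * q)
    l22 = 4 * (1 + q) / (p ** 4 * q ** 2)
    l23 = -12 * (2 + q) / (p ** 5 * q ** 2)
    l33 = 36 * (1 + 4 * q + q ** 2) / (p ** 6 * q ** 3)
    return [[l11, l12, l13], [l12, l22, l23], [l13, l23, l33]]


def det3(L):
    return (L[0][0] * (L[1][1] * L[2][2] - L[1][2] * L[2][1])
            - L[0][1] * (L[1][0] * L[2][2] - L[1][2] * L[2][0])
            + L[0][2] * (L[1][0] * L[2][1] - L[1][1] * L[2][0]))


def leading_inverse_element(L):
    """(Lambda^{-1})_{11} = cofactor of lambda_11 / |Lambda|."""
    cof11 = L[1][1] * L[2][2] - L[1][2] * L[2][1]
    return cof11 / det3(L)


for p in (Fraction(1, 5), Fraction(1, 2), Fraction(3, 4)):
    q = 1 - p
    L = bhattacharyya_matrix_k3(p)
    print(p, leading_inverse_element(L), p ** 2 * q * (1 + q + q ** 2))
```

The factor $1 + q + q^2$ measures how far the third bound rises above the Cramér-Rao bound $p^2q$.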